Abstract

Media sharing applications, such as Flickr and Panoramio, contain a large
number of pictures related to real-life events. For this reason, developing
effective methods to retrieve these pictures is important, but remains a
challenging task. To improve the retrieval effectiveness of tag-based event
retrieval systems, we propose a new method to extract a set of geographical
tag features from the raw geo-spatial profiles of user tags. The main idea is
to use these features to select the best expansion terms in a machine
learning-based query expansion approach. Specifically, we apply rigorous
statistical exploratory analysis of spatial point patterns to extract the
geo-spatial features. We use the features both to summarize the spatial
characteristics of the spatial distribution of a single term, and to determine
the similarity between the spatial profiles of two terms, i.e., term-to-term
spatial similarity. To further improve our approach, we investigate the effect
of combining our geo-spatial features with temporal features on choosing the
expansion terms. To evaluate our method, we perform several experiments,
including well-known feature analyses. Such analyses show how much our
proposed geo-spatial features contribute to improving the overall retrieval
performance. The results from our experiments demonstrate the effectiveness
and viability of our method.

Keywords: Information Retrieval, Spatial Profile, Tag Relatedness, Query
Expansion, Event Retrieval, Social Media Retrieval 


1. Introduction 

The proliferation of web and social media-based photo sharing has not only
opened many possibilities but also resulted in new needs and new challenges.
Despite recent developments and technological advances within, e.g., web-based
media sharing applications, the continuously increasing amount of available
information has made access to these photos a demanding task. In general,
we can address this challenge by allowing the photo collections to be organized
and browsed through the concept of events [1]. Also, most users are generally
familiar with searching photo collections using events as starting points.
Thus, aiming at supporting the detection and search of event-related photos, we
propose an event retrieval framework to improve the state of the art in
real-life event retrieval systems in terms of retrieval effectiveness.

Focusing on media sharing applications, an event refers to "something
happening in a specific place at a specific time, and tagged with a specific
term" [2]. With an event-retrieval system, we can assume two types of
scenarios: (1) a user directly retrieves media resources related to a
particular event; and (2) a user uses a given tagged photo representing an
event to retrieve other photos related to any similar events from a large
image collection. In this work, we mainly focus on scenario (2). Due to their
characteristics, pictures in photo sharing applications such as Flickr^1 and
Panoramio^2 are particularly interesting. For example, most such pictures are
accompanied by contextual metadata and other related information added by
users, such as Title, Tags, Description, temporal information represented by
the picture capture and upload times, and geolocation. Hence, with photo
sharing applications in mind, we study how we can exploit contextual metadata
to retrieve event-related pictures.

The main goals of this work are (1) to build a framework to extract a set of
geographical features from the raw geographical data of documents or pictures,
and (2) to develop an approach to allow effective retrieval of event-based
images. Specifically, we develop a set of geographical features that can
capture the characteristics of the geographical distributions of social (or
user) tags. Further, we investigate how we can combine these features with
state-of-the-art temporal features to improve the retrieval performance of an
event-based image retrieval system. Finally, we explore integrating a
machine-learning-based approach with our retrieval system. We study how these
features can be used in a query expansion framework. Here, we are especially
interested in the contributions of the features to the selection of expansion
terms from feedback documents.

To this end, we propose a novel framework that improves the retrieval
effectiveness of tag-based image search by including the geographical profile
of terms. We have developed a new method for extracting spatial features using
information about the geographical distribution of tags. Our main idea is
to use such features to characterize the clustering tendency of tag terms and
the geographical correlation between the geographical distributions of two
tags. Spatio-temporal information retrieval is already an established field.
However, existing approaches have mainly been concerned with point-of-interest
(POI) extraction [3] and trajectory mining [4]. With the constantly increasing
number of geotagged pictures, e.g., in Flickr^3, exploring the raw geographical
metadata has become increasingly important.

In summary, the main contributions of this paper are as follows. First, we
propose a new robust set of geographical features that can be used (1) to
determine the clustering tendency of tags by analysing the geographical
structure of their geographical distribution, and (2) to analyse the
tag-relatedness between two tags by exploring the correlation between their
geographical distributions. To do this, we have developed new measures derived
from well-founded exploratory analysis theory from statistics. More
specifically, we adapt Ripley's K-function and Ripley's cross-K function
(K-function and cross-K function for short) [5] as part of our approach to
extract the geographical features. Second, we show how our features can be
incorporated in a machine learning-based query expansion model to improve the
ability to select good expansion terms. In addition, we demonstrate how these
features can be combined with existing document-based approaches and temporal
features to achieve improved retrieval performance. Third, through our
experimental evaluation we show the effectiveness and practical feasibility of
our approach. This includes comparing with both baseline retrieval models and
baseline approaches for geo-temporal tag-relatedness. Fourth, we perform a
thorough analysis to show the effectiveness of our proposed geographical
features in the aforementioned machine learning-based query expansion process.

^1 See http://www.flickr.com/
^2 See http://www.panoramio.com/
^3 Around 220M Flickr pictures are geotagged. See also http://www.flickr.com/map/

The rest of this paper is organized as follows. To put our research in
perspective, Section 2 provides an overview of approaches related to our work.
Section 3 gives an overview of the preliminary theory underlying our approach
and defines the problems addressed in this paper. Section 4 presents our
proposed geographical features and explains how we extract them. Section 5
describes our framework applying these features in a learning-based
re-weighting process for a query expansion model. Section 6 explains our
experimental setup. Section 7 presents the results from our experiments.
Finally, in Section 8 we conclude the paper and outline our future work.

2. Related Work 

In the past decades, detection of events from textual document streams and
databases has been treated extensively in the literature. However, although
mining and retrieving pictures related to real-life events is an active field,
it is still not a fully mature research domain. Most related approaches have
been aimed at extracting events from different types of datasets. To the best
of our knowledge, only a few works have addressed the problem of event
retrieval in connection with media sharing, and many of these approaches were
presented in the Social Event Detection (SED) task at MediaEval, where the main
objective was to propose event retrieval systems for Flickr pictures.

A research area closely related to ours is pseudo-relevance feedback.
Generally speaking, pseudo-relevance feedback refers to techniques that use
the top-retrieved documents to automatically expand an initial query. It has
been studied widely in information retrieval both to extend existing retrieval
models and as part of query expansion frameworks. Specifically, Lavrenko and
Croft [10] and Zhai and Lafferty propose two methods, the Relevance Model and
the Mixture Model, respectively, to include feedback information in the
Kullback-Leibler (KL) divergence retrieval model. The idea is to estimate a
new query model using terms in the top-k retrieved documents, also called
pseudo-relevant feedback documents, to update an existing query model.
Experiments have shown that these approaches are indeed able to improve the
standard retrieval models with respect to retrieval effectiveness. This has
also been the main motivation for including them in our study.

Cao et al. present a classification approach to automatically select good
expansion terms from a set of candidate terms from the pseudo-relevant
documents. To do this, they train a classifier using a set of good and bad
candidate expansion terms represented by feature vectors. Such feature vectors
consist of traditional statistical features based on the distribution of the
terms both in the whole collection and in the set of (pseudo-)relevant
documents. Lin et al. propose an extension of this work by applying a
learning-to-rank approach for training and classifying the candidate expansion
terms. They show that they can improve the retrieval effectiveness by using
social annotation from external tagged resources, such as the del.icio.us
social bookmarking web service, as a source for extracting useful expansion
terms. The use of social annotation as a source for improving the retrieval
performance has also previously been investigated by Zhou et al. These
approaches are related to ours in that we also use classification to select
good expansion terms. The main difference from our approach is that none of
them applies temporal, geo-spatial or geo-spatio-temporal features.

As discussed later in this paper, we are interested in investigating the
contributions of the temporal characteristics of a term in a pseudo-relevance
feedback context. Within event retrieval, the usefulness of temporal
information is evident. Also within general information retrieval, results
from existing work have demonstrated its usefulness. For example, Dakka et
al. [20] and Jones and Diaz [21] show how the temporal profile of queries can
be used to improve existing retrieval models, whereas Keikha et al. [22] and
Whiting et al. [23] propose new temporal-based approaches to improve
pseudo-relevance feedback based models. Nevertheless, while existing
approaches seem to have focused on the temporal aspects only, to fully support
event retrieval, we stress the necessity of the spatial profile of social
tags, as well as the temporal profile. To the best of our knowledge, the
combination of both temporal and spatial features of social tags to improve
the retrieval effectiveness has still not been sufficiently investigated.
Only a few methods, e.g., [24, 25], incorporate temporal and spatial
correlation measures to compute term-to-term relatedness. Specifically,
Radinsky et al. [24] propose a method to improve the semantic relatedness
measure of two terms by capturing the correlation between the temporal
profiles of tags and concepts associated with the two terms. Zhang et
al. [25], on the other hand, analyse the tag relatedness by using different
correlation measures, based on spatial and temporal co-occurrence. In summary,
although these approaches are related to ours, the way we extract the spatial
profiles of tags and apply them in combination with the temporal profile is
different. Also, while these approaches were originally developed for textual
documents containing much term redundancy that can normally carry the document
semantics, image tags usually consist of few unique terms. This makes it more
challenging to derive term-based semantic relatedness for image retrieval in
general [26], which further motivates our approach.

3. Preliminary 

In this section, we first describe the data our approach is based on and define
the problem we address. Thereafter, we give an overview of the statistical
method our approach is built on.

3.1. Data and Problem Definition 

This work mainly focuses on media sharing applications, where resources
are usually tagged with terms, i.e., tags, that describe the content of the
resources. Such resources may also have information specifying their
geographical locations, expressed in longitude and latitude values, and are
then referred to as geotagged resources.

Let $\mathcal{D} = \{P_1, \ldots, P_N\}$ be a set containing
$N = |\mathcal{D}|$ resources. Then, assume that each resource $P_i$ can be
annotated with a set of tags $T_i$, a timestamp $\tau_i$ and a geotag
$g_i = (\mathit{latitude}, \mathit{longitude})$, such that
$P_i = \{g_i, \tau_i, T_i\}$, $i = 1, \ldots, N$. Without loss of generality,
we assume our resources to be a set of geotagged pictures downloaded from
Flickr, which may or may not contain all of the above information at the same
time. Further, let $\mathcal{E} = \{E_1, \ldots, E_M\}$, $M = |\mathcal{E}|$,
be a set of picture clusters $E_i \subseteq \mathcal{D}$, $i = 1, \ldots, M$,
each of which contains images related to the same event. To make our approach
as general as possible, we assume that a query picture has only a set of
textual tag terms, i.e., it does not contain any geotag or timestamp. Following
our setup above, a query picture $P_j$ related to an event
$E_i \in \mathcal{E}$ can thus be expressed as $P_j = \{T_j\}$, i.e., $g_j$ and
$\tau_j$ are not included. For simplicity, we will use
$Q = \{q_1, \ldots, q_n\}$ to denote a query picture, where $n = |Q|$ and
$q_i$, $i = 1, \ldots, n$, are the query tag terms.

The problem addressed in this paper concerns how we can effectively retrieve
event-related pictures with a query Q, using only the textual tags. First, we
investigate how current state-of-the-art information retrieval methods perform
when applied to our dataset, and let these methods serve as the baseline for
our experimental evaluation. Second, we study how a query expansion framework
can help us improve the retrieval effectiveness by using a set of spatial
features summarizing the spatial statistics of the distribution related to a
tag, and a set of features defining geographical relatedness between two tags.
Third and finally, we compare our method with the baseline methods.
3.2. Exploring Interaction between Spatial Point Patterns

As mentioned in Section 1, our approach is based on geo-spatial features for
picture tags. To achieve this, we have to build a spatial profile for each tag.

Assume now we have a large dataset $\mathcal{P} \subset \mathcal{D}$
containing $L = |\mathcal{P}|$ geotagged pictures, i.e.,
$\mathcal{P} = \{P_1, \ldots, P_L\}$ and $P_i = \{g_i, T_i\}$,
$i = 1, \ldots, L$. Further, let $V = \{w_1, \ldots, w_W\}$ be the vocabulary,
with size $W = |V|$, of the set of social tags used to annotate $\mathcal{P}$.
Then, to extract the spatial features for each tag $w_i \in V$, we analyse the
spatial characteristics of the tags using statistical exploratory analysis.

To be able to use and understand the ideas of exploratory analysis translated
into our domain, we need to establish two important concepts our approach is
founded on: the picture point process and the tag point pattern. First,
considering Flickr pictures as our geotagged web resources, we model the
spatial distribution of pictures taken in a specific geographical area as a
picture point process, which is formally defined as follows:

Definition 3.1 (Picture Point Process). A Picture Point Process is a point
process modelling the spatial distribution of pictures taken in a 2-dimensional
study region $\mathcal{R}^2$. Any realization of the random variable
$\mathcal{P}$ modelling the Picture Point Process is called a Picture Point
Pattern.


Second, for each term $w_i \in V$, we can assume that we have a set of points
representing the spatial distribution of the tag in a studied region. With this
assumption, we derive a so-called Tag Point Pattern from Definition 3.1 as:


Definition 3.2 (Tag Point Pattern). A Tag Point Pattern, denoted
$\mathcal{P}_i$ for simplicity, for a tag term $w_i$ is a subset of a Picture
Point Pattern $\mathcal{P}$, and is a set consisting of the geographical
positions of pictures annotated with $w_i$.


With these definitions, we can now use statistical exploratory analysis to
derive the geo-spatial characteristics of image tags. More specifically, we use
a tool called the multivariate Ripley's K-function to get the geo-spatial
features from the tags. It is used to study the interaction between two or more
spatial point patterns. To help understand how this is done, below is a brief
overview of the multivariate Ripley's K-function.


3.2.1. Multivariate Ripley’s K-Function 

Ripley's K-function is mainly a tool for analysing completely mapped
spatial point pattern data in a two-dimensional space. Hence, it can be used
to determine the spatial distribution patterns of objects in space.

Let $h$ denote a distance and $\lambda$ be the intensity of a spatial point
pattern. Then Ripley's K-function, $K(h)$, is defined as [5]:

\[
K(h) = \lambda^{-1} E[\#\ \text{other points within distance } h \text{ of an arbitrary point}] \tag{1}
\]

The multivariate Ripley's $K_{ij}(h)$ function is a generalization of $K(h)$,
and is used to analyse the characteristics of an isotropic spatial point
process. It contains information about clustering and dispersion of point
patterns at different distance scales $h$. The multivariate form aims at
answering questions regarding the interaction between two or more point
patterns, i.e., bivariate or multivariate point patterns. It is specified as
follows. Let $\lambda_i$ and $\lambda_j$ be the intensities of the spatial
point patterns $\mathcal{P}_i$ and $\mathcal{P}_j$, and assume that $\lambda_i$
and $\lambda_j$ are constant throughout $\mathcal{R}^2$. Then,

\[
K_{ij}(h) = \lambda_i^{-1} E[\#\ \text{points of type } i \text{ within distance } h \text{ from an arbitrary point } j] \tag{2}
\]

Translated to our application, a point here would be the geographical position
of a picture. Restricting to the case of two point patterns, we have four
K-functions: two self-K functions, $K_{11}(h)$ and $K_{22}(h)$, and two cross-K
functions, $K_{12}(h)$ and $K_{21}(h)$. The following is the most used
estimator of $K_{ij}(h)$, as proposed by Ripley [5]:

\[
\hat{K}_{ij}(h) = \frac{1}{\lambda_i \lambda_j A} \sum_{k} \sum_{l} I_h(d_{i_k, j_l}) \tag{3}
\]

where $d_{i_k, j_l}$ is the distance between the k-th point of type i and the
l-th observed point of type j. $I_h(d_{i_k, j_l})$ is an indicator, such that
$I_h(d_{i_k, j_l}) = 1$ if $d_{i_k, j_l} < h$, and $I_h(d_{i_k, j_l}) = 0$
otherwise. $\lambda_i = n_i / A$ and $\lambda_j = n_j / A$ are the intensities
of the two spatial point patterns, i.e., the ratio between the number of points
and the considered area $A$.
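
As an illustration of the estimator in Eq. 3, the following is a minimal Python sketch, not the authors' implementation. It assumes that the picture positions have already been projected to planar coordinates (e.g., in km), that Euclidean distance is an acceptable approximation, and that no edge correction is applied. The function name `cross_k` and its parameters are hypothetical.

```python
import math


def cross_k(points_i, points_j, h, area):
    """Naive estimate of K_ij(h) as in Eq. 3 (no edge correction).

    points_i, points_j: lists of (x, y) planar coordinates (assumed units: km).
    h: distance threshold, in the same unit as the coordinates.
    area: area A of the study region, in squared coordinate units.
    """
    n_i, n_j = len(points_i), len(points_j)
    if n_i == 0 or n_j == 0 or area <= 0:
        raise ValueError("both patterns must be non-empty and area must be > 0")
    lam_i = n_i / area  # intensity of pattern i
    lam_j = n_j / area  # intensity of pattern j
    # Count ordered pairs (k, l) whose distance is strictly below h.
    close_pairs = sum(
        1
        for (xi, yi) in points_i
        for (xj, yj) in points_j
        if math.hypot(xi - xj, yi - yj) < h
    )
    return close_pairs / (lam_i * lam_j * area)


# Two tiny illustrative patterns inside a 4 km x 4 km window.
pattern_a = [(0.0, 0.0), (1.0, 0.0)]
pattern_b = [(0.0, 0.5), (3.0, 3.0)]
k_ab = cross_k(pattern_a, pattern_b, h=1.0, area=16.0)
```

For a self-K function (i = j), pairs of a point with itself would additionally have to be excluded; the sketch only covers the cross case.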

The above four $K_{ij}$ functions are used in the exploratory analysis to study
the relationship between two spatial point patterns. For example, in the
independence approach proposed by Lotwick and Silverman [55], the null model
assumes that the two spatial point patterns are generated by two different and
independent spatial processes. Under this independence assumption, with the
bivariate form or the cross-K function, $K_{12}(h) = \pi h^2$. From this, the
empirical (estimated) cross-K function $\hat{K}_{ij}(h)$ calculated on the
spatial point patterns $\mathcal{P}_i$ and $\mathcal{P}_j$ can be compared with
the null model to determine the distribution characteristics between the two
point patterns as follows: attraction, if $\hat{K}_{ij}(h) > \pi h^2$; spatial
independence, if $\hat{K}_{ij}(h) = \pi h^2$; and repulsion, if
$\hat{K}_{ij}(h) < \pi h^2$.

3.2.2. Cross-D Function

As can be derived from the above discussion, Ripley's cross-K functions are
useful in characterising the distributions of spatial point patterns. However,
the graph of the $K_{ij}(h)$ function normally has a parabolic shape, which
makes it less straightforward to interpret. As a result, a so-called L-function
is often used instead. An L-function is defined as

\[
L_{ij}(h) = \sqrt{\frac{K_{ij}(h)}{\pi}} \tag{4}
\]

Using the same assumption of independence of spatial point patterns as
before, we get $L_{ij}(h) = h$. As with the K-function, the empirical values of
$L_{ij}(h)$, $\hat{L}_{ij}(h)$, can be used to characterise tag point patterns
$\mathcal{P}_i$ and $\mathcal{P}_j$ as follows: $\hat{L}_{ij}(h) > h$ indicates
attraction between the point patterns, $\hat{L}_{ij}(h) = h$ shows spatial
independence, whereas $\hat{L}_{ij}(h) < h$ means repulsion. To further
facilitate our interpretation, we normalize the cross-L function again to get a
so-called D-function for two tag point patterns. Based on the empirical cross-L
function, the D-function is given by

\[
\hat{D}_{ij}(h) = \hat{L}_{ij}(h) - h \tag{5}
\]

Again, we can use $\hat{D}_{ij}$ to characterize the two tag point patterns
$\mathcal{P}_i$ and $\mathcal{P}_j$ as follows: $\hat{D}_{ij}(h) > 0$ indicates
attraction between $\mathcal{P}_i$ and $\mathcal{P}_j$; $\hat{D}_{ij}(h) = 0$
means we have independence between $\mathcal{P}_i$ and $\mathcal{P}_j$; whereas
$\hat{D}_{ij}(h) < 0$ implies repulsion between $\mathcal{P}_i$ and
$\mathcal{P}_j$. In the rest of the paper, assuming
$\mathcal{P}_i \neq \mathcal{P}_j$, we refer to this function as the cross-D
function.
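
The chain from a K value to the L- and D-functions and the resulting interpretation can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the fixed tolerance `tol` is a simplification introduced here, since in practice significance is judged against simulated confidence envelopes rather than against an exact zero.

```python
import math


def l_from_k(k_value):
    """L(h) = sqrt(K(h) / pi), as in Eq. 4."""
    return math.sqrt(k_value / math.pi)


def d_from_k(k_value, h):
    """D(h) = L(h) - h, as in Eq. 5."""
    return l_from_k(k_value) - h


def interaction(d_value, tol=1e-9):
    """Map a D value to the qualitative interaction type.

    tol is an assumed numerical tolerance, not part of the original method.
    """
    if d_value > tol:
        return "attraction"
    if d_value < -tol:
        return "repulsion"
    return "independence"


h = 2.0
# Under independence K(h) = pi * h^2, so D(h) is 0.
label_null = interaction(d_from_k(math.pi * h ** 2, h))
# A K value above pi * h^2 gives a positive D (attraction).
label_high = interaction(d_from_k(math.pi * 9.0, h))
# A K value below pi * h^2 gives a negative D (repulsion).
label_low = interaction(d_from_k(math.pi * 1.0, h))
```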

Example [Cross-D function]: To explain our ideas, assume we have "Old
Royal Naval College" and "University of Greenwich" as two specific tags,
both referring to areas in London. Then, consider a cross-L function $L_{12}$
between two tag point patterns, $\mathcal{P}_1$ and $\mathcal{P}_2$, as
specified in Definition 3.2, related to these two tags, respectively. A general
observation is that the University of Greenwich (see
http://en.wikipedia.org/wiki/University_of_Greenwich) is located within the
area of the Old Royal Naval College (see
http://en.wikipedia.org/wiki/Old_Royal_Naval_College). Thus, although the tags
are syntactically different, they are connected and refer to the same
geographical entity. Within our spatial statistics, this means that pictures
tagged with "Old Royal Naval College" are spatially attracted to pictures
tagged with "University of Greenwich" (see Figures 1(a) and 1(b)). To further
illustrate this relatedness, consider the corresponding cross-D function
$\hat{D}_{12}(h)$ in Figure 2, varying the values of $h$ between 0 and 2 km.
Using the statistical test described above, we can check the validity of our
observation about the spatial attraction among the studied point patterns. As
can be seen in Figure 2, the graph of $\hat{D}_{12}(h)$ (denoted as "observed"
in the figure) is greater than the upper envelope (denoted as "higher" in the
figure) at all values of $h$. Hence, based on our distribution "rules", we have
"attraction" between the two point patterns.

Figure 1: Spatial distribution of the Tag Point Patterns related to the tag Old Naval College
and the tag University of Greenwich at two different zoom levels, (a) and (b).

Figure 2: The empirical (observed) cross-D function $\hat{D}_{12}(h)$ of the tag point patterns for
Old Naval College and the tag University of Greenwich as a function of distance (in km).
The confidence envelopes (95%), represented by their upper (higher) and lower borders, for the
theoretical cross-D function under complete spatial randomness (CSR) are also shown.

In the following, we elaborate on how we extract our set of features based on 
the spatial characteristics of a tag point pattern, and the interaction between 
two spatial point patterns derived from the cross-D function. 

4. Exploring the Spatial Distribution of Tags 

Recall that the primary goal of this work is to find effective ways to
exploit the spatial characteristics of tags to improve the retrieval
performance. To achieve this, we investigate applying methods from spatial
statistics to explore the spatial distribution of tags. In brief, we apply a
collection of features derived from the bivariate Ripley's cross-L function
presented in Eq. 4 and Ripley's L-function for a single tag point pattern. To
show how we do this, in Section 4.1 we present our method for extracting
general spatial features of tags, including both single and term-to-term
spatial features. In Section 4.2 we focus on special tags, such as tags
describing points of interest, and introduce a method to extract the spatial
features for such tags.

4.1. Single and Term-to-Term Spatial Features

We divide the spatial characteristics for tag terms into two main classes: 
(1) single-term spatial features, which determine the aggregation tendency of a 
single tag spatial point pattern; and (2) term-to-term spatial similarity features, 
which are related to the geographical similarity between the spatial profiles of 
two considered tags Wi and Wj. In the following we explain how we extract both 
these features. 


®We computed the envelope by simulating the random labelling with a null model and 99 
simulations. 
Assume we have a scale interval $S = [0, R]$ in kilometres, and that we
discretize this interval into K equidistant points $h_k$, $k = 1, \ldots, K$.
To extract both the single and term-to-term spatial features, we will use the
D-function from Section 3.2.2 estimated over this interval.

To capture the clustering tendency of tag point patterns for single terms,
in [29] we introduced two features called the $I_{SUM}$ and $I_{MAX}$
estimators. $I_{SUM}$ is computed by extracting the positive area within the
intersection between the D-function and the curve representing the null
hypothesis, whereas $I_{MAX}$ is the maximum distance between the D-function
and the null hypothesis curve. Let $\hat{D}_i$ represent the estimated value of
our D-function for a tag point pattern $\mathcal{P}_i$. Then, for a given tag
term $w_i$,


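\[
I_{SUM}(w_i) = \sum_{k=1}^{K} \frac{\hat{D}_i(h_k)}{\sqrt{\mathrm{Var}(\hat{D}_i(h_k))}} \tag{6}
\]

and

\[
I_{MAX}(w_i) = \max_{k=1,\ldots,K} \left( \frac{\hat{D}_i(h_k)}{\sqrt{\mathrm{Var}(\hat{D}_i(h_k))}} \right) \tag{7}
\]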


In other words, $I_{SUM}(w_i)$ is computed by summing the normalized difference
between the D-function and the null hypothesis. A high value of $I_{SUM}(w_i)$
means that there is a strong aggregation among pictures that are annotated with
$w_i$ and connected to the tag point pattern $\mathcal{P}_i$. Further,
$I_{MAX}(w_i)$ is calculated by estimating the maximum normalized distance
between the D-function and the null hypothesis. Hence, it determines the
highest positive difference between the K-function of the tag point pattern
that the D-function was derived from and the null hypothesis. A high value of
$I_{MAX}(w_i)$ means that the tag point pattern $\mathcal{P}_i$ contributes to
a high degree of clustering.
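
A minimal Python sketch of the two estimators in Eq. 6 and Eq. 7 is given below. It assumes that the D-function values and their variances at the K scale points are already available as plain lists (how the variances are obtained, e.g., from simulations, is not specified here and is left as an assumption); the function names are hypothetical.

```python
import math


def normalized_d(d_values, variances):
    """Return D(h_k) / sqrt(Var(D(h_k))) for every scale point h_k."""
    if len(d_values) != len(variances):
        raise ValueError("d_values and variances must have the same length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")
    return [d / math.sqrt(v) for d, v in zip(d_values, variances)]


def i_sum(d_values, variances):
    """I_SUM as in Eq. 6: sum of the normalized D values."""
    return sum(normalized_d(d_values, variances))


def i_max(d_values, variances):
    """I_MAX as in Eq. 7: maximum of the normalized D values."""
    return max(normalized_d(d_values, variances))


# Illustrative values for K = 4 scale points (not taken from the paper).
d_hat = [1.0, 4.0, -2.0, 3.0]
var_d = [1.0, 4.0, 4.0, 1.0]
tag_i_sum = i_sum(d_hat, var_d)
tag_i_max = i_max(d_hat, var_d)
```

The same two functions apply unchanged to the bivariate case when the inputs are the cross-D values of two tag point patterns.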

For the bivariate case, we can estimate the attraction tendency of two tag
point patterns in a similar way:

\[
I_{SUM}(w_i, w_j) = \sum_{k=1}^{K} \frac{\hat{D}_{ij}(h_k)}{\sqrt{\mathrm{Var}(\hat{D}_{ij}(h_k))}} \tag{8}
\]

and

\[
I_{MAX}(w_i, w_j) = \max_{k=1,\ldots,K} \left( \frac{\hat{D}_{ij}(h_k)}{\sqrt{\mathrm{Var}(\hat{D}_{ij}(h_k))}} \right) \tag{9}
\]

where $w_i$ and $w_j$ are two specific tags with their tag point patterns
$\mathcal{P}_i$ and $\mathcal{P}_j$.

Our initial studies have shown the potential and the usefulness of the above
estimators [29]. To apply them in retrieval settings, however, we have to make
them more generic, and introduce two new concepts: the Relative Discrete
Positive Area (RDPA) and the Relative Discrete Maximum Distance (RDMD). The
main idea is to extend $I_{SUM}(w_i)$, $I_{SUM}(w_i, w_j)$, $I_{MAX}(w_i)$ and
$I_{MAX}(w_i, w_j)$ by including their behaviour at different scales, and not
only at a fixed scale. So, let $g_{Sum}$ denote the function representing the
relative discrete positive area between the D-function and the null hypothesis
in a given scale interval, and assume $g_{Max}$ represents the maximum distance
within the same considered interval. Then, $g_{Sum}$ and $g_{Max}$ are computed
as follows:


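\[
g_{Sum}(w_i, [h_f, h_g]) = \sum_{k=f}^{g} \frac{\hat{D}_i(h_k)}{\sqrt{\mathrm{Var}(\hat{D}_i(h_k))}} \tag{10}
\]

and

\[
g_{Max}(w_i, [h_f, h_g]) = \max_{k=f,\ldots,g} \left( \frac{\hat{D}_i(h_k)}{\sqrt{\mathrm{Var}(\hat{D}_i(h_k))}} \right) \tag{11}
\]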


where $f$ and $g$, with $f < g$, are two indexes related to two points $h_f$
and $h_g$ of the scale interval $S$. Note that if $f = 1$ and $g = K$, then
$g_{Sum}(w_i, [h_f, h_g]) = I_{SUM}(w_i)$ and
$g_{Max}(w_i, [h_f, h_g]) = I_{MAX}(w_i)$. In conclusion, the generalization
captures more features, which divide and summarize the spatial characteristics
over more sub-intervals within the original scale interval.

For the bivariate case, we apply a similar approach, and compute
$g_{Sum}(w_i, w_j, [h_f, h_g])$ and $g_{Max}(w_i, w_j, [h_f, h_g])$ by
replacing the $\hat{D}_i$ function with $\hat{D}_{ij}$.
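
The sub-interval estimators of Eq. 10 and Eq. 11 can be sketched in Python as below. The sketch assumes 1-based, inclusive indexes f and g as in the text, and takes precomputed D values and variances as inputs; all names are hypothetical and the numbers are purely illustrative.

```python
import math


def _window(d_values, variances, f, g):
    """Normalized D values for the scale points h_f, ..., h_g (1-based, inclusive)."""
    if len(d_values) != len(variances):
        raise ValueError("d_values and variances must have the same length")
    if not 1 <= f < g <= len(d_values):
        raise ValueError("indexes must satisfy 1 <= f < g <= K")
    return [
        d / math.sqrt(v)
        for d, v in zip(d_values[f - 1:g], variances[f - 1:g])
    ]


def g_sum(d_values, variances, f, g):
    """RDPA over [h_f, h_g], as in Eq. 10."""
    return sum(_window(d_values, variances, f, g))


def g_max(d_values, variances, f, g):
    """RDMD over [h_f, h_g], as in Eq. 11."""
    return max(_window(d_values, variances, f, g))


d_hat = [1.0, 4.0, -2.0, 3.0]
var_d = [1.0, 4.0, 4.0, 1.0]
full_sum = g_sum(d_hat, var_d, 1, 4)  # whole interval: equals I_SUM
sub_sum = g_sum(d_hat, var_d, 2, 3)   # sub-interval [h_2, h_3]
sub_max = g_max(d_hat, var_d, 2, 3)
```

Evaluating both functions over several (f, g) pairs yields one feature per sub-interval, which is how the generalization captures behaviour at different scales.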


4.2. N-order Spatial Features

The features in Eq. 10 and Eq. 11 estimate the deviation of the D-function of
the tag point pattern (or the two tag point patterns) from the null hypothesis,
i.e., the spatial randomness for a single tag point pattern, and the spatial
independence between two tag point patterns, respectively. In addition to this,
in our study we observed that for some tags representing points of interest,
the curve of the D-function related to a tag point pattern tends to be steeper
within a short scale sub-interval. Therefore, to also capture such a
characteristic, we propose a set of features, called first order spatial
features, that can extract information on the shape of the curve of the
D-function. In geometry, the derivative $f'(x)$ of a source function $f(x)$ can
generally be used to determine the slope of the tangent of the source curve at
a point $x$. Using this as a starting point, our idea is to analyse the
derivative function of the D-function for each sub-interval. Since the
D-function is discrete over the scale values $h_k$, $k = 1, \ldots, K$, we
apply the discrete equivalent of the derivative function, or more specifically
the forward finite difference [50], as follows:

\[
\hat{D}'_i(h) = \Delta_{l,m} \hat{D}_i(h) = \hat{D}_i(h_l) - \hat{D}_i(h_m), \quad \forall\, h_m < h_l \tag{12}
\]

where $h_l$ and $h_m$ are two specific scale points. Note that the value of
$\hat{D}'_i(h)$ is positive at each scale point where the D-function increases,
but negative at all scale points where the D-function decreases. Moreover, the
higher the positive value of $\hat{D}'_i(h)$ is, the more steeply the function
increases. Finally, for the bivariate form of the D-function, we can compute
the derivative of $\hat{D}_{ij}(h)$ as $\hat{D}'_{ij}(h)$ by extending Eq. 12
to take into account both $w_i$ and $w_j$.


Besides determining the slope of the D-function, we are also interested in
knowing about the concavity of this function at some point. This gives us more
information about the structure or the shape of the function, thus providing
us with more spatial features. We call such features second order spatial
features, which we get by differentiating the function $\hat{D}'_i(h)$ once
more. As before, we estimate the resulting $\hat{D}''_i(h)$ function by finite
differences. This means that we can extract the spatial features from
$\hat{D}'_i(h)$, $\hat{D}'_{ij}(h)$ and $\hat{D}''_i(h)$, $\hat{D}''_{ij}(h)$
using the positive area and the maximum distance estimators in Eq. 10 and
Eq. 11.
In the next section, we show how the spatial features presented above are 
useful, especially when used in a query expansion framework for event-based 
image retrieval. 
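
The first and second order differences described in Section 4.2 can be illustrated with the short Python sketch below. It assumes adjacent, equidistant scale points (each difference is taken between consecutive points), which is a simplification of the general pairwise form in Eq. 12; the function name and the sample values are hypothetical.

```python
def forward_difference(values):
    """Forward finite difference between consecutive scale points.

    Returns a list that is one element shorter than the input; entry k is
    values[k + 1] - values[k], so it is positive where the curve increases.
    """
    return [values[k + 1] - values[k] for k in range(len(values) - 1)]


# Illustrative D-function values at five equidistant scale points.
d_hat = [0.0, 1.0, 4.0, 9.0, 7.0]

# First order: slope of the D-function between neighbouring scale points.
first_order = forward_difference(d_hat)

# Second order: difference of the slopes, indicating concavity.
second_order = forward_difference(first_order)
```

The resulting sequences can then be summarized with the same positive area and maximum distance estimators that are applied to the D-function itself.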


5. Query Expansion Framework 

Query expansion techniques have been among the most studied approaches
within the information retrieval field since the work by Maron and Kuhns [31].
However, new application areas mean that query expansion is still needed in
order to improve the retrieval effectiveness [32]. Nevertheless, reinventing
query expansion techniques is not the focus of this work per se. Rather, we use
query expansion as a framework to evaluate the effectiveness of our proposed
method on event-related image retrieval. In this section, we specifically
elaborate on how we use our proposed spatial features within a query expansion
framework. In addition, we explain how spatial features can be combined with
temporal features for better retrieval performance.


5.1. Overview of the Kullback-Leibler Expansion Model 

A general query expansion model is a post-processing step in a retrieval system that expands and re-weights an original query with terms from the top-k retrieved documents, which are assumed to be pseudo-relevant. Such top-k retrieved documents are also called feedback documents.

The Kullback-Leibler divergence-based approach (or just KL-divergence) is a query expansion approach that has proven effective in terms of retrieval performance. The main idea of KL-divergence is to analyse the term distributions, and to maximize the divergence between the distribution of the terms in the top-k retrieved documents and the distribution of terms over the entire collection. The terms chosen for the query expansion are those contributing to the highest divergence - i.e., the terms having the highest so-called KL-scores. To compute the KL-score for a specific term t in the feedback documents, the following equation is used:


$$KL(t) = P_{Rel}(t) \log \left[ \frac{P_{Rel}(t)}{P_{Coll}(t)} \right] \tag{13}$$


where $P_{Rel}(t)$ and $P_{Coll}(t)$ are the probabilities that t appears in the top-k documents and in the collection, respectively. $P_{Rel}(t)$ can be estimated by the normalized term frequency of t in the top-k documents, while $P_{Coll}(t)$ can be computed as the normalized frequency of t in the entire collection. This also means that, using Eq. 13, terms with low probability in the entire collection and high probability in the retrieved top-k documents have the highest KL-scores.
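
The KL-score computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: documents are represented as lists of tags, the natural logarithm is used, the feedback documents are part of the collection (so the collection probability is never zero), and the function name `kl_scores` and the toy tags are hypothetical.

```python
import math
from collections import Counter


def kl_scores(feedback_docs, collection_docs):
    """Return {term: KL-score} for every term in the feedback documents.

    Each document is a list of tags. Probabilities are estimated as
    normalized term frequencies, following the description of Eq. 13.
    """
    rel_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    rel_total = sum(rel_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in rel_counts.items():
        p_rel = count / rel_total
        # Non-zero because the feedback documents belong to the collection.
        p_coll = coll_counts[term] / coll_total
        scores[term] = p_rel * math.log(p_rel / p_coll)
    return scores


# Toy collection; the first two documents play the role of feedback documents.
collection = [
    ["coachella", "festival", "music"],
    ["coachella", "music", "stage"],
    ["music", "beach"],
    ["music", "city"],
]
scores = kl_scores(collection[:2], collection)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example, the tag that is frequent in the feedback documents but rare elsewhere is ranked first, whereas a tag that is common in the whole collection receives a negative score.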

After the expansion terms have been selected, we can proceed to re-weighting the query terms. A classical approach for this is Rocchio's algorithm, using Rocchio's Beta equation [33], given by:

$$\hat{w}(t_q) = \frac{tf_q}{\max tf_q} + \beta \, \frac{w(t_q)}{\max w} \tag{14}$$

where $\hat{w}(t_q)$ denotes the new weight of a term $t_q$ of the query, $w(t_q)$ is the weight from the expansion model - i.e., $KL_{Div}(t_q)$, $\max w$ is the maximum weight from the expanded weight model, $\max tf_q$ is the maximum term frequency in the query, $tf_q$ denotes the frequency of the term in the query, and $\beta$ is the weighting parameter of Rocchio's formula.

Since KL divergence is currently one of the state-of-the-art query expansion 
approaches, it has been the natural baseline approach for our experiments. 


5.2. Learning-based Query Expansion Framework 

As can be inferred from our discussion in previous sections, the approach 
proposed in this paper concerns using geo-spatio temporal features in query 
expansion frameworks. In particular, we develop a learning-based approach to 
choose good expansion terms and maximize the retrieval performance. 


Figure 3: Overview of our supervised learning-based query expansion framework

Figure 3 shows the principle behind our approach. As shown in this figure, the process is divided into two main parts: an offline processing module and an online search module. In the offline part, we mainly focus on building a classification model for selecting good candidate terms. In the online part, on the other hand, the main focus is on using the model in a search context to select the actual - previously unseen - candidate expansion terms. Algorithm 1 summarizes the steps in the query expansion (QE) process.

Algorithm 1 Query expansion procedure

 1: Run query Q applying ranking model r
 2: Get the set D of top-N relevant docs
 3: Extract unique tags from D and get the candidate expansion term set $\mathcal{E}$
 4: for $e_j \in \mathcal{E}$ do
 5:     X ← ExtractTermFeats($e_j$, Q)          ▷ Sec. 5.2.1
 6:     Y ← ExtractTemporalFeats($e_j$, Q)      ▷ Sec. 5.2.1
 7:     Z ← ExtractGeoFeats($e_j$, Q)           ▷ Sec. 5.2.1, Sec. 5.2.2
 8:     Calculate confidence value Conf         ▷ Sec. 5.2.3, Fig. 4
 9:     Combine KL score and confidence value Conf in a single score $KL_{Final}(e_j)$    ▷ Sec. 5.2.3
10: end for
11: Rank the terms $e_j \in \mathcal{E}$ according to $KL_{Final}(e_j)$ → $\mathcal{E}_{Rank}$
12: Re-build Q with the top-k terms from $\mathcal{E}_{Rank}$
13: Run query Q by using ranking model r

An important question is: how do we select the candidate expansion terms? To answer this question, recall that $Q = \{q_1, \ldots, q_n\}$ is a query consisting of n terms, and let $\mathcal{E} = \{e_1, \ldots, e_m\}$ denote the set of candidate terms for the query expansion process. A good candidate expansion term is a term that improves the retrieval performance of the original query Q. Building on a similar principle as the approach in [13], we find such terms by computing the improvements in average precision (AP). The idea is as follows. First, for any query Q, we compute the average precision gained from running the original query. We call this $AP(Q)$. Then, we calculate $AP(Q + e_i)$, which is the average precision for the query we get from expanding Q with a specific candidate expansion term $e_i$. Finally, we find the improvement in terms of average precision from the original query to the expanded one by computing


$$AP_{diff}(Q, e_i) = \frac{AP(Q + e_i) - AP(Q)}{AP(Q)} \tag{15}$$


In other words, a candidate expansion term $e_i$ is a good term if $AP_{diff}(Q, e_i)$ is positive. Otherwise, it is considered a bad term. In practice, a threshold $\theta$ is used to control the difference value, such that $AP_{diff}(Q, e_i) > \theta$ means we have a good expansion term, whereas $AP_{diff}(Q, e_i) \le \theta$ means $e_i$ is a bad term. Cao et al. [13] suggest $\theta = 0.005$ as the default threshold. However, because the application area of [13] is mainly different from ours, we decided to do an empirical study with different classification algorithms to find the optimal value of $\theta$ (see Section 7.1).
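
The labelling rule can be illustrated with a short Python sketch. It assumes that the average precision values have already been computed and that AP(Q) is greater than zero; the function names `ap_diff` and `label_term` and the example values are hypothetical.

```python
def ap_diff(ap_original, ap_expanded):
    """Relative improvement in average precision (Eq. 15)."""
    if ap_original <= 0:
        # Assumption of this sketch: the original query retrieves at least
        # one relevant document, so that AP(Q) > 0.
        raise ValueError("AP(Q) must be positive")
    return (ap_expanded - ap_original) / ap_original


def label_term(ap_original, ap_expanded, theta=0.005):
    """Label an expansion term 'good' if AP_diff exceeds theta, else 'bad'."""
    return "good" if ap_diff(ap_original, ap_expanded) > theta else "bad"


# Hypothetical values: AP(Q) and AP(Q + e) for two candidate terms.
ap_q = 0.25
ap_expanded = {"coachella": 0.30, "music": 0.20}
labels = {e: label_term(ap_q, ap) for e, ap in ap_expanded.items()}
```

Raising `theta` makes the positive class stricter, so fewer candidate terms are labelled as good.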

To perform the actual term selection, we define the selection task as a binary classification problem. The main idea is to learn a classifier that discriminates the good expansion terms from the bad ones. Thus, we use Eq. 15 as a basis for the learning process, and to define the positive examples for the classifier. As we will discuss later, the main advantage of this approach is its effectiveness. However, to achieve good results, the selection of features is a crucial task. Below, we discuss how we select the feature set that can be used to represent each expansion term $e_i$. Thereafter, we explain how we use a classifier to compute a confidence value as a function of $e_i$, as part of the retrieval process. Finally, we present a way to combine this value with the baseline KL-score of the same candidate term to re-rank the set of candidate expansion terms $\mathcal{E}$.

5.2.1. Selecting the Feature Set

Selecting the right set of features has a direct impact on the accuracy of a classification algorithm. This is also one of the reasons we emphasize the importance of studying the effects of feature selection on the end (retrieval) results. To learn a classifier, we define a vector of features for each candidate expansion term e from the top-k retrieved items, given a query $Q = \{q_1, q_2, \ldots, q_n\}$. Table 1 lists the features we study in this work. We group them into three sets of features: term, temporal and spatial features. Since our focus is on event-based retrieval, this calls for features beyond those describing document contents only.


Term Features (T): The term features consist of features that are used to characterize a document's content. They are chosen based on the hypothesis that terms that contribute to improving the retrieval effectiveness are those that are most frequent and distinctive [13]. Existing studies suggest using features related to the distribution of the candidate term e in the feedback documents and the whole collection, and features capturing the co-occurrence of e with the terms in the original query Q. It is, however, worth noting that these features have mainly been applied in full-text document retrieval, where term redundancy is normal, and thus term frequency is an important feature. Since a tag generally appears only once for each picture, term frequency as a feature generally has no impact on the classification accuracy. For this reason, in our experiments, our set of term features does not include term frequency, but rather other traditional statistical features such as document frequency (DF) [34]. As part of evaluating our approach, we use the term features in implementing the baseline approach for our experiments. This also allows us to assess how well the features suggested in this work improve the retrieval performance.

Temporal Features (Y): Once again, since our focus is on event retrieval, we are interested in capturing how each term in image tags contributes to characterising the images over time periods. Therefore, we need a set of statistical features that represent the temporal distribution of the term in the whole collection. Here, we propose single term features and term-to-term features related to the temporal correlation of the candidate expansion term and the query terms. More specifically, to capture the characteristics of the temporal distribution of a single term, we adopt the concept of kurtosis, defined as $\mu_4 / \mu_2^2$, where $\mu$ is the mean and $\mu_j$ is the j-th central moment. The use of kurtosis was originally proposed by Jones and Diaz to capture the dynamics of a time series. It can be used to quantify how much of the probability distribution is concentrated in the peaks of a time series - i.e., the "peakedness". In this work, we propose to measure the peakedness for both a single candidate expansion term e (KURT1), and the combination of a candidate expansion term e with a term $q_i$ from the original query (KURT12).

Feature               Description

Term Features
DF0(e)                Raw document frequency.
DF1(e)                Inverse document frequency: log(N/DF0).
DF2(e)                Smoothed inverse document frequency: log(1 + N/DF0).
DF3(e)                Probabilistic inverse document frequency: log((N - DF0)/DF0).
CoOccSingle(e)        Co-occurrence with single query terms (log-scaled, with n = |Q|).
CoOccPair(e)          Co-occurrence with pairs of query terms (log-scaled, with n = |Q|).

Temporal Features
KURT(e),              The kurtosis value of the time series for the pictures annotated with
KURT(Q + e)           an expansion term e, and for the pictures annotated with both an
                      expansion term e and a query Q, respectively.
AC(e),                The autocorrelation value of the time series for the pictures annotated
AC(Q + e)             with an expansion term e, and for the pictures annotated with both an
                      expansion term e and a query Q, respectively.
CC(Q, e)              The maximum cross-correlation between the time series for the pictures
                      annotated with an expansion term e and the time series for the pictures
                      annotated with a query Q.

Spatial Features
Z_Max(e),             The vector of the values from the g_Max function, related to the
Z_Max(e + Q)          D-function of the spatial point patterns for the pictures annotated with
                      a candidate expansion term e, and with both e and Q, respectively.
Z_Max(e, Q)           The vector of the values from the g_Max function, related to the
                      cross-D-function between the tag point patterns associated with e and
                      the tag point patterns of Q.
Z_Sum(e),             The vector of the values from the g_Sum function, related to the
Z_Sum(e + Q)          D-function of the spatial point patterns for the pictures annotated with
                      a candidate expansion term e, and with both e and Q, respectively.
Z_Sum(e, Q)           The vector of the values from the g_Sum function, related to the
                      cross-D-function between the tag point patterns associated with e and
                      the tag point patterns of Q.

Table 1: A Summary of the Set of Features

In addition to this, we are interested in the randomness of terms over time. A way to detect such randomness is to use autocorrelation [35]. In general, autocorrelation is computed by finding the statistical correlation between two values of the same variable at a given time $t_i$ and another time $t_{i+m}$. Such values can, for example, be the number of occurrences of a term e at specific times. The hypothesis is that bursty events in a time series normally contribute to a high autocorrelation value [21]. To capture this, we compute the first order autocorrelation of a time series for both a single candidate expansion term e (AC1), and the combination of a candidate expansion term e with a term $q_i$ from the original query (AC12). Finally, to measure the temporal similarity between the time series of two different terms $q_i$ and e, we can apply the cross-correlation measure (CC). Cross-correlation is computed by assessing the correlation of the frequencies of $q_i$ and e to measure the relationship between $q_i$ and e. To compute the temporal features, we varied the time windows or bins from one day to seven days, of which seven days gave the best results. To summarize, we investigate how combining previously proposed temporal features affects the retrieval performance. These features have proven successful in other, more general information retrieval approaches, but the effects of their combination within event-related image retrieval have not been analysed before.
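
The three kinds of temporal features can be sketched in plain Python as follows. This is only an illustration: the time series are assumed to be equal-length lists of per-bin tag counts, the cross-correlation is taken as the maximum Pearson correlation over a small range of lags, and the function names, the `max_lag` parameter and the example counts are hypothetical rather than taken from the paper.

```python
def kurtosis(series):
    """Kurtosis mu_4 / mu_2^2, computed from the central moments."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return m4 / m2 ** 2


def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag (first order by default)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sum((p - mx) ** 2 for p in x) ** 0.5
    sy = sum((q - my) ** 2 for q in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def max_cross_correlation(a, b, max_lag=3):
    """Maximum Pearson correlation between a and b over lags -max_lag..max_lag."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        best = max(best, pearson(x, y))
    return best


# Hypothetical weekly counts: term_q follows term_e with a one-bin delay.
term_e = [0, 0, 1, 9, 2, 0, 0, 0]
term_q = [0, 0, 0, 1, 9, 2, 0, 0]
flat = [1, 2, 1, 2, 1, 2, 1, 2]
```

In this example, the bursty series `term_e` has a much higher kurtosis than the regular series `flat`, and the two shifted series reach a cross-correlation of 1 once the one-bin delay is compensated.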


Spatial Features (Z): As explained earlier, the concept of event is strongly related to the spatial dimension - i.e., geographical location. We hypothesize that a good expansion term is spatially correlated with at least one of the query terms. This is the main reason we study the impact of clustering tendency, with respect to the spatial distribution of the pictures annotated with the candidate expansion terms. As part of this, we compute the spatial features as presented in Section 4.1. For each pair of terms e and $q_i$, we first extract the set of geographical world tiles $T_{q_i,e}$ containing spatial points related to documents annotated with $q_i$, spatial points associated with documents annotated with e, and those related to documents annotated with both $q_i$ and e. Next, we extract a set of six spatial feature vectors from each tile in $T_{q_i,e}$. The first three feature vectors are the vectors computed using $g_{Max}$ - i.e., the relative discrete maximum distance function for the specified tag point pattern (see Eq. 11), consisting of $Z_{Max}(e)$, $Z_{Max}(e + Q)$ and $Z_{Max}(e, Q)$. The second set of feature vectors is based on $g_{Sum}$ - i.e., the relative discrete positive area function (see Eq. 10), consisting of $Z_{Sum}(e)$, $Z_{Sum}(e + Q)$ and $Z_{Sum}(e, Q)$. For all the extracted features, we compute the values of the functions by varying the distance values from 0 to 1 km, with a step of 0.1 km. Finally, for both the first and second derivatives, we perform similar operations with $g_{Max}$ and $g_{Sum}$, as described in Section 4.2. Note that, as can be inferred from this, the input query Q used to extract the features may have varying dimensions. However, this does not cause a problem; it only affects the number of spatial points used to build the feature vectors, which is implicitly decided by the value of the distance scale h.

In this work, we study the impact of combining these features with the temporal features. Later, we analyse their usefulness and importance with respect to improving the retrieval performance.


5.2.2. Combining the Spatial Features using the World Dataset 

Our dataset has been built from a collection of Flickr pictures covering the whole world map. For this reason, the spatial distribution of the pictures is not uniform. To cope with this, we divide the entire world map into a number of tiles. More specifically, we divide the world map into a grid of cells of one degree of latitude by one degree of longitude. We span the longitude over the range [-180, ..., +180] degrees, and the latitude over [-70, ..., +70] instead of [-90, ..., +90] degrees, to avoid the Arctic and Antarctic areas, since these areas normally have poor photographic activity. The north-south extent of each tile (one degree of latitude) is constant at about 111 km, whereas the east-west extent (one degree of longitude) varies with the latitude, from around 0 km at the poles to around 111 km at the equator. This gives us 50,400 tiles in total. However, to restrict the computation cost, we only consider tiles containing a significant number of pictures - i.e., more than 1,000 pictures. We call such tiles significant tiles, and denote the set of them by $\mathcal{T}$.
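
The tiling can be illustrated with the following Python sketch, which maps a coordinate to a one-degree tile and approximates the east-west extent of a tile. The row-major numbering of the tiles, the spherical approximation based on the cosine of the latitude, and all names are assumptions of this example.

```python
import math

LAT_MIN, LAT_MAX = -70, 70       # latitude band used for the grid
LON_MIN, LON_MAX = -180, 180
KM_PER_DEGREE = 111.0            # approximate length of one degree of latitude

N_ROWS = LAT_MAX - LAT_MIN       # 140 rows of one degree each
N_COLS = LON_MAX - LON_MIN       # 360 columns of one degree each


def tile_id(lat, lon):
    """Map a coordinate to a one-degree tile index, or None outside the grid."""
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        return None
    row = math.floor(lat) - LAT_MIN
    col = math.floor(lon) - LON_MIN
    return row * N_COLS + col


def tile_width_km(lat):
    """Approximate east-west extent of a one-degree tile at the given latitude."""
    return KM_PER_DEGREE * math.cos(math.radians(lat))


total_tiles = N_ROWS * N_COLS
```

With these bounds the grid contains 140 x 360 = 50,400 tiles, which matches the number stated above. At 60 degrees latitude a tile is only about half as wide as at the equator.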

With this in mind, we extract the spatial feature vectors for a pair of terms $w_i$ and $w_j$ - e.g., a query term $q_i$ and an expansion term e - as follows. First, let $T_i = \{\tau_{i1}, \tau_{i2}, \ldots, \tau_{iN}\}$ be the set of N tiles containing pictures tagged with $w_i$, and $T_j = \{\tau_{j1}, \tau_{j2}, \ldots, \tau_{jM}\}$ be the set of M tiles containing pictures tagged with $w_j$. Then, to find a significant tile $\tau_{ij} \in \mathcal{T}$ containing pictures tagged with both $w_i$ and $w_j$, we merge the two sets $T_i$ and $T_j$. Finally, to get the feature vectors, for each tile $\tau_{ij}$, we compute the bivariate D-function and the corresponding estimators for the tag point patterns for both $w_i$ and $w_j$ (see Section 4).

To have a data structure allowing efficient feature extraction operations, we index each tile $\tau_i$ as a document composed of the set $W_{\tau_i} = \{w_1, \ldots, w_{|\tau_i|}\}$ of all tags annotating the pictures in that tile. To do this, we create an inverted index, I, with an entry for each tag $w_i$, which we can formulate as follows:

$$I : \{ w_i \rightarrow \{ \langle \tau_{i1}, tf_{\tau_{i1}}(w_i) \rangle, \langle \tau_{i2}, tf_{\tau_{i2}}(w_i) \rangle, \ldots \} \} \tag{16}$$

This means that each tag is linked to an inverted list containing, for each tile in which the tag occurs, the id of the tile and the term frequency $tf_{\tau}(w_i)$ of the tag $w_i$ in that tile.
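
A minimal in-memory version of such an inverted index can be sketched in Python as follows. The input format (a mapping from tile id to the list of tags found in that tile), the function names and the reading of "merging" two tile sets as an intersection of the posting lists are assumptions of this example.

```python
from collections import Counter, defaultdict


def build_tile_index(tile_tags):
    """Build {tag: [(tile_id, term_frequency), ...]} from {tile_id: [tags]}."""
    index = defaultdict(list)
    for tile, tags in sorted(tile_tags.items()):
        for tag, tf in sorted(Counter(tags).items()):
            index[tag].append((tile, tf))
    return dict(index)


def shared_tiles(index, tag_a, tag_b):
    """Tiles whose pictures carry both tags (intersection of the posting lists)."""
    tiles_a = {tile for tile, _ in index.get(tag_a, [])}
    tiles_b = {tile for tile, _ in index.get(tag_b, [])}
    return sorted(tiles_a & tiles_b)


# Hypothetical tiles with the tags of their pictures.
tiles = {
    101: ["coachella", "music", "coachella"],
    102: ["music", "beach"],
    103: ["coachella", "desert"],
}
index = build_tile_index(tiles)
both = shared_tiles(index, "coachella", "music")
```

Looking up a tag returns its posting list directly, so the tiles shared by a query term and a candidate expansion term can be found without scanning all tiles.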

Selection of the Tiles for Spatial Features Extraction. To select the tiles for spatial feature extraction, we are mainly interested in the tiles containing pictures that are annotated with both at least one term in Q and a candidate expansion term e. However, to make the spatial features suitable for our classifier, we select only the one tile that is most representative of a specific input query Q. We call this the best tile. To do this, we first run Q on our dataset. Then, we select the first K geotagged pictures from the resulting ranked list. Finally, we select the tile with the highest TF-IDF-based ranking score. For simplicity, by treating tiles as documents, we index and search them using the Solr search platform (http://lucene.apache.org/solr/). Thus, the resulting list of tiles is ranked using the Lucene score (http://lucene.apache.org/core/3_6_2/scoring.html).

5.2.3. Query Re-weighting Process

We now explain how we perform the re-weighting process using the sets of features presented in the previous sections.

Figure 4: Good expansion term selection process through classification

Figure 4 shows a part of the term selection and re-weighting process. As depicted in the figure, the Temporal Classifier is trained with positive and negative examples using only term and temporal features, while the Spatio-Temporal Classifier is trained with instances using the complete set of features. Thus, given a query term $q_i$ and the candidate expansion term e, we first extract the complete set of features. Thereafter, the input instances are classified with both of the classifiers. Finally, a Final Score Selection module designates the final confidence value. By default, our system produces scores based on all three feature sets. However, we might have a situation in which we do not have geo-tagged pictures that are annotated with both e and any $q_i \in Q$. In that case, producing the spatial feature vectors from the functions $g_{Max}(Q + e)$ and $g_{Sum}(Q + e)$ would be hard. If this happens, the final score from the Final Score Selection module is based on the term and temporal features only.

We call the final confidence score for good candidate expansion terms from our expansion term selection process $Conf(+|e)$. To produce the final Kullback-Leibler (KL) score for the query expansion process, we combine $Conf(+|e)$ with the term-based KL-score as follows:

$$KL_{Final}(e) = \alpha \, KL(e) + (1 - \alpha) \, Conf(+|e). \tag{17}$$

Note that to allow this combination, both the $Conf(+|e)$ and $KL(e)$ values are normalized. Here, $\alpha$ is a constant used to decide which component should have the highest contribution. That is, $\alpha = 1$ means that we only apply the regular KL-divergence, whereas $\alpha = 0$ means we rely entirely on the classification modules to choose good expansion terms. Since we are interested in the impact of our expansion terms on the retrieval performance, we let both components contribute equally to the final score - i.e., we use $\alpha = 0.5$.

The confidence value $Conf(+|e)$ is computed based on the idea that the temporal and the spatio-temporal classifiers each contribute by exploring terms over different dimensions, and that they complement, rather than extend, each other. With this in mind, $Conf(+|e)$ can be computed as follows:

$$Conf(+|e) = \begin{cases} 0, & \text{if } Conf_T < 0.5 \text{ and } Conf_{ST} < 0.5 \\ Conf_T, & \text{if } Conf_T > 0.5 \text{ and } Conf_{ST} < 0.5 \\ Conf_{ST}, & \text{if } Conf_T < 0.5 \text{ and } Conf_{ST} > 0.5 \\ \frac{Conf_T + Conf_{ST}}{2}, & \text{if } Conf_T > 0.5 \text{ and } Conf_{ST} > 0.5 \end{cases} \tag{18}$$

Here, $Conf_T(+|e)$ and $Conf_{ST}(+|e)$ are the confidence values from the Temporal Classifier and the Spatio-Temporal Classifier, respectively. The choice of the value 0.5 as threshold is based on the fact that $0 \le Conf(+|e) \le 1$ and that we aim at having final confidence values higher than half the highest possible value. Our experimental results have shown that this is a sensible choice.
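
The score combination of Eq. 17 and Eq. 18 can be sketched in Python as follows. The sketch assumes that both confidences and the KL-score are already normalized to [0, 1], that the fourth case of Eq. 18 is the average of the two confidences, and that a confidence exactly equal to the threshold is treated as below it; the function names are hypothetical.

```python
def final_confidence(conf_t, conf_st, threshold=0.5):
    """Combine the two classifier confidences following the cases of Eq. 18."""
    t_ok = conf_t > threshold
    st_ok = conf_st > threshold
    if t_ok and st_ok:
        return (conf_t + conf_st) / 2   # both classifiers vote "good"
    if t_ok:
        return conf_t                   # only the temporal classifier
    if st_ok:
        return conf_st                  # only the spatio-temporal classifier
    return 0.0                          # neither classifier is confident


def kl_final(kl_score, conf, alpha=0.5):
    """Linear combination of the normalized KL-score and the confidence (Eq. 17)."""
    return alpha * kl_score + (1 - alpha) * conf


# Hypothetical normalized scores for one candidate expansion term.
conf = final_confidence(conf_t=0.8, conf_st=0.6)
score = kl_final(kl_score=0.6, conf=conf)
```

When no geo-tagged pictures are available for a term, one option consistent with the text is to call `kl_final` with the temporal confidence alone.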

6. Experimental Setup

In this section we present our dataset and the methodology for our experimental evaluation.

6.1. Dataset

To perform our experiments with tag-based search for event-related pictures and to check the feasibility of our approach, we use a large dataset of pictures gathered from Flickr through the Flickr API (http://www.flickr.com/services/api/), covering the time period from 01.01.2006 to 31.12.2010 and without spatial restrictions. This results in a final dataset consisting of 88,257,485 pictures, of which 18,861,585 pictures are without any tags and around 23.5% have 1 to 3 tags. For relevance judgement, we apply the well-established Upcoming dataset [37] as our ground truth. It has also been used previously in other related approaches [38]. Specifically, the Upcoming dataset consists of 270,425 pictures from Flickr, taken between 01.01.2006 and 31.12.2008, each of which belongs to a specific event from the Upcoming event database (see http://www.cs.columbia.edu/~hila/wsdm-data.html). The number of unique events is 9,515. Each event is composed of a variable number of images, from 1 to 2,398 pictures. This large number and the heterogeneity of the included events are the main advantages of the Upcoming dataset, and the main reason we decided to use it. For generality, we merged the Upcoming dataset with the set of other Flickr pictures.

To perform our experiments, we indexed all image tags using Terrier (see http://www.terrier.org/). As part of the dataset preparation, we perform a preprocessing step consisting of tokenization based on whitespace and punctuation marks; stemming, using the Porter stemmer algorithm [39]; and English stopword removal.

6.2. Evaluation Methodology

In this section, we briefly explain how we evaluate our approach. First, we present our input query set. Second, we discuss the methods we used as baselines for our experiments. Third, we elaborate on the evaluation metrics we applied.

6.2.1. Input Query Set

We randomly selected a set of 150 pictures, one for each event cluster in the Upcoming dataset, and use the tags annotating the pictures as queries. We divide this set of queries into two subsets: one subset consisting of 100 queries that we use to train and evaluate the performance of the classifiers, and another subset consisting of the remaining 50 queries that are used as the test set to evaluate the retrieval effectiveness of the proposed retrieval framework. For completeness, in Table 2 we show some examples of the input queries used in our experiments.

Query                                   Event Description

hammermuseum, weswood, ioecho           Concert of "The Duke Spirit" band at UCLA Hammer Museum, 17th of July, 2008
spiritualized, coachella                Coachella Valley Music and Arts Festival, 26th of April, 2008
gibsonamphitheatre, universalcitywalk   Download 2008 music festival at Gibson Amphitheatre, Los Angeles, 20th of July, 2008

Table 2: Example of queries extracted from the Upcoming dataset.

6.2.2. Baseline Methods

To assess the effectiveness of the retrieval framework, we compare our models with several baseline methods. First, we perform the searching process using classical retrieval models, including the Vector Space Model (VSM), Okapi BM25 (BM25) [10], and the Language Model (LM) for information retrieval with Jelinek-Mercer and Dirichlet smoothing. Since BM25 gave the best results in terms of effectiveness, we only show the results related to this model. We use the default parameter values $k_1 = 1.2$, $k_3 = 8$ and $b = 0.75$ as the baseline for our evaluation. As a query expansion model, we use the basic KL-divergence model (KL) and, as a further baseline, the machine learning approach with the baseline features as proposed by Lin et al. (KLML). For simplicity and readability, we only show the results of KL, since we observed that the MAP values of KLML are comparable with the MAP values of KL. We compare the baseline approaches with our proposed methods: first, a query expansion framework applying a classifier trained with the combination of term and temporal features (KL_T); and then a framework with a classifier learned with the combination of term, temporal, and spatial features (KL_ST). Note that in addition to the above models, we also experimented with the Mixture Model and the Relevance Model, also incorporating the feedback documents in the ranking score computation. However, the results from these experiments were, though comparable, worse than those from the BM25+KL query expansion models. Thus, for simplicity, we did not include the results from these experiments.

6.2.3. Comparison with Related Work

To have a fair comparison with similar approaches, we implemented the geo-temporal tag relatedness by Zhang et al. [25], which we, from now on, refer to as ZKYC [25] for simplicity. As with our approach, with ZKYC [25] the similarity between two tags is computed by comparing their temporal and geographical distributions with so-called geo-spatial, temporal, and geo-temporal similarity measures. First, they quantize the world map (space) into m tiles of 1 degree, and the time into n temporal bins of two weeks. Then, they extract the tag features based on the three measures using vectors of the numbers of users applying a tag in each bin. This means that, for a specific tag, the geo-spatial feature vector contains m elements with the numbers of users applying that tag in each spatial bin; the temporal feature vector contains n elements with the numbers of users applying the tag in each temporal bin; and the geo-temporal feature vector or matrix contains m x n elements with the counts of unique users tagging a picture within each geo-temporal bin. All vectors are normalized with the $\ell_2$-norm. Zhang et al. [25] get the similarity between two tags by computing the Euclidean distance between the two corresponding feature vectors.
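
The comparison of two tags in this baseline can be sketched in Python as follows. The sketch assumes L2 normalization followed by the Euclidean distance, as described above; the function names and the per-bin user counts are hypothetical and are not taken from the implementation of Zhang et al.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical numbers of users applying each tag in four spatial bins.
coachella = [10, 0, 2, 0]
indio = [5, 0, 1, 0]
london = [0, 8, 0, 3]

d_close = euclidean_distance(l2_normalize(coachella), l2_normalize(indio))
d_far = euclidean_distance(l2_normalize(coachella), l2_normalize(london))
```

Because of the normalization, two tags with the same relative distribution over the bins have distance 0 regardless of their absolute popularity, while tags that never share a bin reach the maximum distance of the square root of 2.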

As can be inferred from this, the main difference between our approach and ZKYC [25] lies in the geographical and geo-temporal features used and in how they are extracted. Specifically, with ZKYC [25], the geographical feature vector related to a tag is static, and represents the distribution of the tag over a single bin size; whereas basing our approach on the Ripley K-function enables us to characterize the geographical distribution of tags over non-fixed geographical scales. As discussed previously, the Ripley K-function also allows us to extract the clustering properties of tags. In our experiments, we pay special attention to how this difference affects the retrieval performance. Specifically, we consider a range of scales between 0 km and 3 km when computing the K-function. According to Zhang et al. [25], the geo-temporal features yielded the best results. For this reason, we only compare our approach with the one applying the geo-temporal features. To incorporate this relatedness in a retrieval framework and compare it with our approach, we define a ranking equation equivalent to Eq. 17 as $\alpha \, KL + (1 - \alpha) \, rel_{geo\text{-}temp}$, where $\alpha$ is a constant deciding the contribution of the components, KL is the KL-score and $rel_{geo\text{-}temp}$ denotes the geo-temporal tag relatedness score. We tune and select the best value of the parameter $\alpha$ over a set of 50 queries.


6.2.4. Evaluation Metrics

To evaluate the retrieval performance of all the models, we use Mean Average Precision (MAP), a widely used evaluation metric within information retrieval. We compute our MAP values based on 1,000 retrieved documents (images). To make sure that any improvements are statistically significant, we perform paired two-sample one-tailed t-tests at p < 0.05, i.e., at the 95% confidence level. Any stated improvements in this paper are statistically significant, unless otherwise specified.
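
For reference, the following Python sketch shows one common way to compute average precision and MAP over a cutoff of retrieved documents. Normalizing by the total number of relevant documents, the function names and the toy rankings are assumptions of this example, not a description of the evaluation tool used in the experiments.

```python
def average_precision(ranked_ids, relevant_ids, cutoff=1000):
    """Average precision of one ranked list, evaluated on the top `cutoff` results."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant hit
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs, cutoff=1000):
    """Mean of the per-query average precisions; runs is a list of (ranking, relevant set)."""
    return sum(average_precision(r, rel, cutoff) for r, rel in runs) / len(runs)


# Two toy queries with their rankings and relevant documents.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),             # AP = 1/2
]
map_value = mean_average_precision(runs)
```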

7. Results


In this section we perform two different analyses. First, we study the impact 
of using our temporal and spatial features on training classification algorithms. 
Second, we investigate the effectiveness of using the temporal and spatial fea¬ 
tures in a classifier with an optimal feature selection procedure. 

7.1. Classification Accuracy 

As part of the process of designing a good classifier for the selection of good expansion terms, we investigated which classifier is suitable for our application. We evaluated several existing classifiers with respect to their classification accuracy, and selected the classifier yielding the best accuracy. Specifically, we tested our method using the Naive Bayes classifier, Support Vector Machine (SVM), the C4.5 decision tree (also named J48) and Random Forest. We used the Weka [41] machine learning toolkit with default parameter settings to test the classifiers. The training set consisted of 1,000 terms, equally divided into good and bad terms. These were obtained by randomly selecting the feedback terms from the results of running the queries in the training set.

To perform a thorough evaluation, we calculated the accuracy, precision and recall values for each classifier, with leave-one-out cross validation. We performed the test for different training sets, which we obtained by selecting a positive and a negative class using different values of the threshold $\theta$ (see also Section 5.2). The $\theta$ values we selected were 0.001, 0.005, 0.01, 0.05, 0.1, and 0.5.

We summarize the averaged results in Table 3. Here, "+" is the positive class containing the good candidate terms, whereas "-" denotes the negative class holding the terms considered bad expansion terms. From these results, we can observe that the general performance of the classifiers is good using the proposed set of features. The overall best result was obtained with the Random Forest classifier, with an accuracy of around 95%. The precision for the classification of the good terms was 93% and the recall was as high as 97.56%.




                 Accuracy (%)    Precision            Recall
                                 +         -          +         -
Naive Bayes      59.12           0.6102    0.6092     0.6180    0.5644
SVM              69.22           0.6802    0.7072     0.7288    0.6556
C4.5/J48         91.52           0.8870    0.9490     0.9536    0.8768
Random Forest    94.98           0.9288    0.9736     0.9756    0.9240

Table 3: Comparison of the classification performances. The best score in
each column is achieved by Random Forest. 
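
To make the relation between the columns of Table 3 explicit, the following
sketch computes the accuracy and the per-class precision and recall from a
binary confusion matrix. The function name and the counts are hypothetical;
the counts are only chosen to resemble the Random Forest row and are not
taken from our experiments.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy plus per-class precision/recall from confusion-matrix counts.

    tp/fn: good terms classified as good/bad; fp/tn: bad terms classified
    as good/bad.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision+": tp / (tp + fp),
        "precision-": tn / (tn + fn),
        "recall+": tp / (tp + fn),
        "recall-": tn / (tn + fp),
    }

# Hypothetical counts for 500 good and 500 bad terms.
metrics = binary_metrics(tp=488, fn=12, fp=38, tn=462)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the two classes are balanced, the accuracy equals the mean of the two
recall values, which also holds for every row of Table 3 (for instance,
(0.9756 + 0.9240) / 2 = 0.9498 for Random Forest).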

We now analyse how the accuracy of the four classifiers behaves over the
different θ values. The results are summarized in Figure 5. Here, we can
observe that the higher the threshold value is, the higher the accuracy of
the classifiers becomes. Moreover, both J48 and Random Forest (RF)
outperformed Naive Bayes (NB) and SVM by a wide margin. 



Figure 5: Accuracy of the four different classifiers over the different values of θ 

There are several factors that may affect the performance of classification
algorithms, and these can also help explain the results above. They include
the randomness and sparsity of the actual dataset, the probability of noise
and outliers, the size of the dataset, and the number of independent
features, i.e., the dimensionality. In addition, many algorithms need
calibration to perform well [42]. Focusing on our experiments, the results
showed that tree-based classification approaches work best, of which Random
Forest is the best classifier. This is likely because we experimented with a
large dataset that has a high degree of randomness and a high number of
independent features. This conclusion is also supported by results from
other studies [42, 43]. Moreover, the fact that we applied the classifiers
with default parameters, with no tuning, played an important role. Since we
focus on the ability to treat the classifiers as a "black box", squeezing
out every bit of performance by tweaking the classifiers' parameters is
beyond the scope of this work. In conclusion, we choose Random Forest as the
base classifier for our framework. 

7.2. Retrieval Effectiveness Comparison 

As part of our evaluation, we performed a comparative study on the retrieval
performance. We compared our approaches with the baseline methods by
executing a standard retrieval model, i.e., BM25, and applying the query
expansion models described in the previous section, i.e., Kullback-Leibler
(KL) divergence, in combination with both the temporal features (KL_T) and
the spatio-temporal features (KL_ST). More specifically, we used Rocchio's
framework weighting model, with the KL divergence model to choose the
expansion terms. For each query expansion run, we used the default value of
β from [35], i.e., β = 0.4, and chose the first n terms of the top-k
documents for Rocchio's Beta weighting model. The numbers of pseudo-relevant
documents, k, were set to 20, 40, 60, 80, 100, and 120, and the numbers of
selected terms, n, were 15, 25, 35, 45, and 55. Finally, we performed the
query expansion with a baseline model based on the geo-temporal tag
similarities proposed by Zhang et al., i.e., the ZKYC [25] method discussed
in Section 6.2.3. 

                 Baseline              Related method    Our approaches
#Doc    #Term    BM25       KL         KL+ZKYC [25]      KL_T           KL_ST
20      15       0.4448     0.4601     0.4614            0.4752^12      0.4816^12
        25       0.4448     0.4605     0.4626            0.4755^12      0.4838^123
        35       0.4448     0.4618     0.4638            0.4761^12      0.4838^123
        45       0.4448     0.4618     0.4634            0.4764^12      0.4833^123
        55       0.4448     0.4618     0.4624            0.4761^12      0.4838^123
40      15       0.4448     0.4708     0.4744            0.4786^12      0.4870^123
        25       0.4448     0.4714     0.4738            0.4799^12      0.4880^123
        35       0.4448     0.4705     0.4734            0.4813^12      0.4885^123
        45       0.4448     0.4726     0.4757            0.4843^12      0.4918^123
        55       0.4448     0.4717     0.4745            0.4827^12      0.4913^123
60      15       0.4448     0.4665     0.4674            0.4816^12      0.4848^12
        25       0.4448     0.4685     0.4696            0.4818^12      0.4909^123
        35       0.4448     0.4704     0.4733            0.4856^12      0.4951^123
        45       0.4448     0.4712     0.4721            0.4877^12      0.4957^123
        55       0.4448     0.4703     0.4731            0.4867^12      0.4957^123
80      15       0.4448     0.4697     0.4706            0.4803^12      0.4894^123
        25       0.4448     0.4699     0.4711            0.4847^12      0.4935^123
        35       0.4448     0.4718     0.4733            0.4862^12      0.4951^123
        45       0.4448     0.4712     0.4731            0.4884^12      0.4979^123
        55       0.4448     0.4719     0.4741            0.4890^12      0.5001^123
100     15       0.4448     0.4613     0.4621            0.4701^12      0.4727^12
        25       0.4448     0.4611     0.4619            0.4755^12      0.4802^12
        35       0.4448     0.4634     0.4642            0.4781^12      0.4849^123
        45       0.4448     0.4613     0.4621            0.4803^12      0.4879^123
        55       0.4448     0.4621     0.4631            0.4820^12      0.4891^123
120     15       0.4448     0.4592     0.4601            0.4681^12      0.4709^12
        25       0.4448     0.4589     0.4610            0.4769^12      0.4814^12
        35       0.4448     0.4606     0.4625            0.4774^12      0.4870^123
        45       0.4448     0.4589     0.4606            0.4829^12      0.4899^123
        55       0.4448     0.4595     0.4606            0.4845^12      0.4914^123

Table 4: MAP comparison between the baseline QE (KL) and QE with a classifier
learned with baseline+temporal features (KL_T) and baseline+temporal+spatial
features (KL_ST). KL_ST obtains the best score in every row. The numbers 1,
2, 3 after the ^ sign (superscripts in the original table) indicate
statistically significant improvements with respect to KL, KL+ZKYC [25] and
KL_T, respectively. 
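
The baseline query expansion run described above (KL term scoring followed by
Rocchio's Beta reweighting with β = 0.4) can be sketched as follows. This is
a simplified illustration under stated assumptions, not the implementation
used in our experiments: the function names, the unit weight given to
original query terms, the normalization by the maximum score, and the toy tag
collection are all hypothetical.

```python
import math
from collections import Counter

def kl_term_scores(feedback_docs, collection_docs):
    """Score each feedback term t by p(t|R) * log2(p(t|R) / p(t|C)).

    R is the set of pseudo-relevant (top-k) documents and C the whole
    collection; R is assumed to be a subset of C, so p(t|C) > 0.
    """
    fb = Counter(t for doc in feedback_docs for t in doc)
    coll = Counter(t for doc in collection_docs for t in doc)
    fb_total, coll_total = sum(fb.values()), sum(coll.values())
    scores = {}
    for term, freq in fb.items():
        p_fb = freq / fb_total
        p_coll = coll[term] / coll_total
        scores[term] = p_fb * math.log2(p_fb / p_coll)
    return scores

def expand_query(query_terms, feedback_docs, collection_docs, n_terms, beta=0.4):
    """Keep the n best-scoring terms and add beta * normalized score (assumed form)."""
    scores = kl_term_scores(feedback_docs, collection_docs)
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
    weights = {t: 1.0 for t in query_terms}  # assumed weight of original terms
    if not top or top[0][1] <= 0:
        return weights  # nothing informative to add
    max_score = top[0][1]
    for term, score in top:
        weights[term] = weights.get(term, 0.0) + beta * score / max_score
    return weights

# Hypothetical collection of tagged photos; the first two play the role of
# the top-k (k = 2) documents returned by the initial retrieval.
collection = [
    ["oslo", "marathon", "run"], ["oslo", "marathon", "finish"],
    ["oslo", "fjord"], ["paris", "tower"], ["paris", "louvre"], ["beach", "sun"],
]
expanded = expand_query(["marathon"], collection[:2], collection, n_terms=2)
print(expanded)
```

In KL_T and KL_ST, the candidate terms produced by such a scoring step are
additionally filtered by the classifier before being added to the query.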


Table 4 lists the results from our experiments. As shown, the baseline query
expansion method is better than the baseline BM25 in all of our tests, with a
best MAP improvement of 6.2%. We can also see that ranking the feedback tags
using KL and ZKYC [25] to select query expansion terms does not significantly
improve on the ranking effectiveness of KL alone. This is mainly because
ZKYC [25] captures the features of a tag distribution on a fixed
geo-temporal scale, due to the size of the geo-temporal bin. 

Overall, both of our proposed query expansion methods outperform both of the
baseline methods, for all the combinations of number of documents and number
of terms. For KL_T, the maximum MAP improvement is 9.7%, while for KL_ST,
the improvement is 12.4%. Moreover, studying the MAP values, both KL_T and
KL_ST outperform ZKYC [25]. As discussed in Section 6.2.3, an important
difference between our method and ZKYC [25] is the property of geo-temporal
attractiveness of terms with respect to scales. Recall that with ZKYC [25],
the geo-temporal features are extracted using a fixed scale. In contrast, our
methods allow extracting the features at different scales, and take spatial
attractiveness into account (as described earlier). Because the
concentration of pictures normally varies both in time and space, considering
spatial attractiveness and scales is important. The above results further
confirm this importance. In conclusion, the ability to capture the
geo-temporal attractiveness of terms at different geo-temporal scales leads
to improved retrieval performance. 



[Figure 6: two panels plotting MAP improvement against # Docs: (a) Average
MAP Improvements; (b) Maximum MAP Improvements]

Figure 6: Comparison of MAP improvements as function of feedback documents 

In Figure 6, we summarize the improvements of MAP compared to BM25, while
varying the number of feedback documents (k). Specifically, in Figure 6(a),
for each method, we first take the average MAP values over the different
numbers of terms. Then, we plot the values as a function of the number of
feedback documents. In Figure 6(b), on the other hand, we plot the best MAP
values for each method by only taking into account the numbers of feedback
documents, independent of the number of terms. As we can observe in both
graphs, all four query expansion methods have similar trends; that is, the
search process using each method benefits from the query expansion until
reaching a specific number of documents, i.e., a breakpoint, and thereafter
this benefit decreases. However, the breakpoint for both of our approaches
(KL_T and KL_ST) is much higher than with both the baseline KL and ZKYC
[25], i.e., 80 versus 40. The reason for this is that with baseline KL, the
set of candidate expansion terms is explored by considering only document
features, which seems to be too restrictive. Moreover, ZKYC [25] does not
consider spatial attractiveness and variation in scales. 
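
The aggregation behind the two panels of Figure 6 can be reproduced from the
MAP values in Table 4. The sketch below does this for KL and KL_ST only; the
variable and function names are our own choices for illustration, while the
numbers are copied from Table 4.

```python
from statistics import mean

BM25_MAP = 0.4448  # BM25 baseline MAP from Table 4

# MAP values from Table 4, indexed by the number of feedback documents k;
# each list holds the scores for n = 15, 25, 35, 45, 55 expansion terms.
MAP = {
    "KL": {
        20: [0.4601, 0.4605, 0.4618, 0.4618, 0.4618],
        40: [0.4708, 0.4714, 0.4705, 0.4726, 0.4717],
        60: [0.4665, 0.4685, 0.4704, 0.4712, 0.4703],
        80: [0.4697, 0.4699, 0.4718, 0.4712, 0.4719],
        100: [0.4613, 0.4611, 0.4634, 0.4613, 0.4621],
        120: [0.4592, 0.4589, 0.4606, 0.4589, 0.4595],
    },
    "KL_ST": {
        20: [0.4816, 0.4838, 0.4838, 0.4833, 0.4838],
        40: [0.4870, 0.4880, 0.4885, 0.4918, 0.4913],
        60: [0.4848, 0.4909, 0.4951, 0.4957, 0.4957],
        80: [0.4894, 0.4935, 0.4951, 0.4979, 0.5001],
        100: [0.4727, 0.4802, 0.4849, 0.4879, 0.4891],
        120: [0.4709, 0.4814, 0.4870, 0.4899, 0.4914],
    },
}

def improvement_pct(map_value):
    """Relative MAP improvement over BM25, in percent."""
    return 100.0 * (map_value - BM25_MAP) / BM25_MAP

def improvement_curve(method, aggregate):
    """Improvement per k; aggregate=mean gives panel (a), aggregate=max panel (b)."""
    return {k: improvement_pct(aggregate(v)) for k, v in MAP[method].items()}

def best_k(method, aggregate):
    """Number of feedback documents at which the improvement peaks."""
    curve = improvement_curve(method, aggregate)
    return max(curve, key=curve.get)

for method in MAP:
    print(method, best_k(method, mean), best_k(method, max))
```

For both aggregations, the peak lies at k = 40 for KL and at k = 80 for
KL_ST, and the largest improvement of KL_ST over BM25 is about 12.4%.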


7.3. Analysis of the Features 

In this section, we analyse the effectiveness of the temporal and spatial
features we have used to learn the classifiers for selecting the expansion
terms. The question we want answered is: Do the features we have proposed in
this paper contribute to improving the classification accuracy, and which
features work best? To ensure comprehensiveness, we perform our analyses
using three different widely-used correlation-based feature evaluation
methods. More specifically, we use Information Gain (IG), Gain Ratio (GR) and
Symmetrical Uncertainty (SU). Information Gain is given by
$IG(C,F) = \mathcal{H}(C) - \mathcal{H}(C|F)$, where $\mathcal{H}(C)$ is the
entropy of a class $C$ and $\mathcal{H}(C|F)$ is the entropy of the class,
given a feature $F$. Gain Ratio is the direct extension of Information Gain,
which is $GR(C,F) = IG(C,F)/\mathcal{H}(C)$. Symmetrical Uncertainty (SU)
evaluates the goodness of a subset of features $F$ by comparing its
symmetrical uncertainty with another subset of features [35]. Let
$F_{sub_1} \subset F$ and $F_{sub_2} \subset F$ be two such subsets. Then,
$SU(F_{sub_1}, F_{sub_2}) = IG(F_{sub_1}, F_{sub_2}) / (\mathcal{H}(F_{sub_1}) + \mathcal{H}(F_{sub_2}))$.
As before, we use Weka to implement the feature selection methods. 
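
As an illustration of the three measures, the sketch below implements the
formulas exactly as written above for discrete-valued variables. It is not
the Weka code used in our experiments; the function names and the toy data
are assumptions, and continuous features would first have to be discretized.
Note that other definitions in common use (for example, those documented for
Weka's attribute evaluators) normalize the gain ratio by the entropy of the
feature and include a factor of 2 in the symmetrical uncertainty.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(target, given):
    """H(target | given): entropy of target within each value of `given`."""
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    n = len(target)
    return sum(len(members) / n * entropy(members) for members in groups.values())

def information_gain(target, given):
    """IG(C, F) = H(C) - H(C | F)."""
    return entropy(target) - conditional_entropy(target, given)

def gain_ratio(classes, feature):
    """GR(C, F) = IG(C, F) / H(C), following the formula in the text."""
    return information_gain(classes, feature) / entropy(classes)

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = IG(X, Y) / (H(X) + H(Y)), following the formula in the text."""
    return information_gain(x, y) / (entropy(x) + entropy(y))

# Hypothetical toy data: four terms, one perfectly predictive feature and
# one feature that is independent of the class.
classes = ["+", "+", "-", "-"]
predictive = [1, 1, 0, 0]
independent = [1, 0, 1, 0]

for name, feature in [("predictive", predictive), ("independent", independent)]:
    print(name, information_gain(classes, feature),
          gain_ratio(classes, feature), symmetrical_uncertainty(classes, feature))
```

A feature whose score is 0.000 under all three measures, such as the
independent feature in this toy example, carries no information about the
class.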

Tables 5, 6 and 7 report the IG, GR and SU scores, respectively, for the
features we used in our classification of good and bad expansion terms. They
show which features are the best when using the baseline and temporal
features, compared with applying the baseline, temporal and spatial features. 


Baseline+Temporal                     Baseline+Spatial+Temporal
Feature                   IG Score    Feature                  IG Score
coOccSingle_Whole         0.104       AC2                      0.204
CC                        0.066       RDPA2_Second[1]          0.106
KURT12                    0.065       RDMD12[2]                0.097
AC12                      0.046       RDPA12[3]                0.088
^ Feedback                0.035       RDMD12[1]                0.086
CoOccSingle_Feedback      0.034       RDMD12[3]                0.086
CoOccPair_Feedback        0.031       RDPA12_First[3]          0.080
DF0_Feedback              0.029       RDPA12[1]                0.074
F F \ F eedback           0.025       F F"^ F eedback          0.064
FFQi Feedback             0.025       RDMD12_Second[1]         0.063
DF2_Whole                 0.021       RDP [1]                  0.062
DF0_Whole                 0.021       RDPA12[2]                0.062
DF1_Whole                 0.021       RDMD12[4]                0.056
DF3_Whole                 0.021       KURT2                    0.048
coOccPair_Whole           0.021       coOccSingle_Whole        0.048
KURT1                     0.000
AC1                       0.000




Table 5: Comparison of the feature quality based on Information Gain. RDMDs
are the features related to the relative discrete maximum distance feature
vectors. RDPAs are the features related to the relative discrete positive
area vectors. "L" means that we use a cross-L (or cross-D) function. "First"
and "Second" stand for first and second order feature, respectively.
"[number]" denotes the number, k, of intervals used to compute the L (or D)
function. 

Baseline+Temporal                     Baseline+Spatial+Temporal
Feature                   GR Score    Feature                  GR Score
KURT12                    0.104       RDMD12_Second[4]         0.157
AC12                      0.086       RDPA12_First[4]          0.116
CoOccSingle_Feedback      0.079       RDPA12_Second[3]         0.111
coOccSingle_Whole         0.056       CC                       0.109
CC                        0.054       RDMD12_First[1]          0.109
DF0_Feedback              0.040       RDPA12[3]                0.107
FF3_Feedback              0.040       RDPA12_Second[4]         0.107
FF\Feedback               0.036       RDMD12[3]                0.105
F) F^, Feedback           0.036       RDMDL12_Second[2]        0.098
CoOccPair_Feedback        0.031       RDMD12_Second]^]         0.093
coOccPair_Whole           0.025       RDMD12[2]                0.093
DF3_Whole                 0.025       RDMDL12_Second[2]        0.088
DF2_Whole                 0.025       RDMD12[1]                0.085
DF0_Whole                 0.025       RDMDL12[4]               0.078
DF1_Whole                 0.025       RDPAL12[2]               0.078
KURT1                     0.000
AC1                       0.000

Table 6: Comparison of the feature quality based on Gain Ratio. RDMDs are the
features related to the relative discrete maximum distance feature vectors.
RDPAs are the features related to the relative discrete positive area
vectors. "L" means that we use a cross-L (or cross-D) function. "First" and
"Second" stand for first and second order feature, respectively. "[number]"
denotes the number, k, of intervals h_k used to compute the L (or D)
function. 


Focusing on the baseline and temporal features, these results show that with
all three feature selection methods, i.e., IG, GR and SU, none of the
features related to the temporal autocorrelation (AC1) and kurtosis (KURT1)
has any impact on the classification. This means that the information about
peaks in the temporal distribution of candidate expansion terms does not seem
to have any effect on determining good candidate expansion terms. However,
the temporal correlation between the distribution of documents annotated with
a candidate expansion term and of those annotated with a term from the
initial query, i.e., AC12 and KURT12, seems important, as their scores are
within the top-5 highest scores. A similar observation can be made for the
cross-correlation, i.e., CC, between the time series of a candidate expansion
term and a query term. 

Focusing on our set of features, i.e., the baseline, temporal and spatial
features, on the other hand, our observation is that with all three feature
selection methods, the most important features are those related to the
vectors Max(Q, e) (called RDMDL12 in Tables 5, 6 and 7), Max(Q + e) (or
RDMD12), Sum(Q, e) (called RDPAL12 in Tables 5, 6 and 7) and Sum(Q + e) (or
RDPA12). This means that the features related to the spatial distributions
of the documents annotated with both the candidate expansion terms and the
query terms, and the spatial correlation between the two tag point patterns,
have a strong impact on the classification results. As a conclusion, our
analysis confirms the importance of using the spatial correlations between a
candidate expansion term and a query term as features for classification of
good and bad candidate expansion terms. 

Baseline+Temporal                     Baseline+Spatial+Temporal
Feature                   SU Score    Feature                  SU Score
KURT12                    0.080       AC2                      0.107
coOccSingle_Whole         0.073       RDPA12[3]                0.096
AC12                      0.060       RDMD12[2]                0.095
CC                        0.059       RDMD12[3]                0.094
CoOccSingle_Feedback      0.048       RDMD12[4]                0.067
Feedback                  0.037       RDPA12_First[3]          0.064
DF0_Feedback              0.034       RDPA12[2]                0.063
CoOccPair_Feedback        0.031       RDPA12_First[1]          0.062
FF\ Feedback              0.029       RDMD12_Second[1]         0.057
FF‘2 Feedback             0.029       RDPA2_Second[1]          0.057
DF2_Whole                 0.023       RDPA12[4]                0.054
DF0_Whole                 0.023       RDPAL12_First[2]         0.054
DF1_Whole                 0.023       F F"^ F eedback          0.053
DF3_Whole                 0.023       RDMD12[1]                0.052
coOccPair_Whole           0.023       RDMD12_First[3]          0.051
KURT1                     0.000
AC1                       0.000

Table 7: Comparison of the feature quality based on Symmetrical Uncertainty.
RDMDs are the features related to the relative discrete maximum distance
feature vectors. RDPAs are the features related to the relative discrete
positive area vectors. "L" means that we use a cross-L (or cross-D) function.
"First" and "Second" stand for first and second order feature, respectively.
"[k]" denotes the number, k, of intervals h_k used to compute the L (or D)
function. 


8. Conclusion 

In this work, we have developed a new approach to effectively retrieve
event-based images from typical media sharing applications, such as Flickr.
To achieve this, we have developed a new method using a new set of spatial
features extracted from image tags to capture the characteristics of the
spatial distributions of such tags. This has included applying rigorous
statistical exploratory analysis of spatial point patterns to extract the
geo-spatial features. As we have shown in this paper, with these features,
we have been able to both summarize the spatial characteristics of the
spatial distribution of a single term, and identify the similarity between
the spatial profiles of two terms. Further, aiming at improving the retrieval
performance, we have investigated the gain of combining our geo-spatial
features with a set of temporal features from the current state-of-the-art
approaches within information retrieval. In addition, we have studied the
usefulness of our method by applying our features in a machine-learning-based
query expansion framework. More specifically, we have used our spatial and
temporal features to select the best candidate terms for the query expansion
process. The originality of this work lies in the way we extract these
features and how we use them to choose the best expansion terms. Our
experiments and extensive analyses, including comparison against the baseline
methods and existing work, have demonstrated the effectiveness of our
approach. These have particularly shown the importance of our proposed
spatial features and the feasibility of our approach. 

Nevertheless, there are interesting aspects of this work that we have left
for further investigation. First, to further explore the usefulness of our
spatial features in more general information retrieval settings, we are
currently studying the application of our approach to resources other than
pictures. Second, in this paper, we have treated the selection of candidate
expansion terms as a binary classification problem. As part of making our
approach even more generic, we are investigating unsupervised selection of
expansion terms based on their associated temporal and geo-spatial
characteristics. Third, we are exploring the combination of this approach
with Open Linked Data resources, such as DBpedia, to further improve the
choice of the best expansion term candidates. 